###Questions - How can we analyse genetic, molecular and phenotypic data when we have thousands of measurements per individual?

###Objectives - Understand the concept of a QTL - Run a QTL analysis with simulated data using MatrixEQTL - Understand the output and interpret the results


##Preliminaries

  1. Get the files and setup a working directory:
  1. Clone or download scripts and data from Antonio’s GitHub repository: - Download the code and data folders directly using bash (if you’re on a Mac, Linux or have e.g. gitforwindows):
pwd
cd ~/Desktop/
wget https://github.com/AntonioJBT/teaching_ICL/archive/master.zip
unzip master.zip
cd teaching_ICL-master
mkdir my_results
cd my_results
  1. Alternatively: - Open a browser and go to the GitHub repository, look for the “Clone or download” link on the right hand side and press “Download ZIP” - Go to where you saved the folder, unzip it, create a folder called ‘my_results’ and enter it
  1. Open RStudio and this script:
  1. Start the exercises :-D

###Notebook setup

We’ll first need to specify in R where we want to run the analysis from. If you followed the instructions above you can set your working directory to the same location. This is much easier to do manually from the R console. Copy and paste the following:

setwd('~/Desktop/teaching_ICL-master/my_results/')
getwd()

####Saving results

Normally we would prefer to keep code, data and results in separate directories. Here we’re using R Notebook and the knitr package to run commands, create output and produce a nice looking report file which contains code, text and results together. This R notebook let us have it all in one place, which can sometimes be very useful, particularly for exploratory analysis.

We’ll setup a separate path to a ‘results’ directory for plots that we might want to save independently though. We also already have a separate ‘data’ folder where we keep our data sets.

In a standard R script we would simply use commands such as setwd(), source(), etc. An important caveat here is that, by default, knitr uses the same directory as your .Rmd file to find any files that are necessary to run the code (e.g. any data files) and, if any output is generated, it???ll be saved in that directory as well. This is set independently for each code chunk.

To make it easier we can add an output path to make sure we save to the directory we want:

path_results <- '../my_results/'

####A note on memory in R > v3.5 in macOSX

If you errors related to “vector memory exhausted” you can try the following solution: R on MacOS Error: vector memory exhausted (limit reached?) - Stack Overflow

You can find more information here: - R: Get Environment Variables - R: Set or Unset Environment Variables - R: Memory Available for Data Storage

Remember to choose something that is suitable for the physical and virtual memory on your computer. Also, you may want to modify or delete the “.Renviron” file when you’re done (unless you are happy keeping the new memory settings), in particular if you have saved it in your home location (i.e. in “~/”, which in my case is “/Users/antoniob”).

####Load packages

You may need to install them first, e.g.:

source("https://bioconductor.org/biocLite.R")
biocLite()
biocLite('package')
library(knitr)
library(plyr) # check the tidyverse packages if you're not familiar with them yet
library(MatrixEQTL) # run regression analysis efficiently for millions to billions of tests
# library(data.table) # reads, writes, processes data frames very quickly and can handle large sets

###Brief introduction to the analysis practical

In the previous exercise we analysed candidate markers using regression association tests. Here, we’ll perform the same type of analysis for thousands of genetic variants and molecular phenotypes.

###Data

The data we’ll be using consists of (badly) simulated sets. We have only simulated a few characteristics of the data but it is not an attempt to characterise genetic or molecular markers in a serious way. The data contains single nucleotide polymorphisms and molecular data (such as gene expression or metabolite data points).

We can assume that the data has already been quality controlled and all the pre-processing steps have been carried out.


##Analysis ###Load and inspect the data

What are the names of the input files?

Name them once here and if they change you only need to change the code once. (e.g. if you get new data, start a different project using the same analysis code, etc.).

Small files are on GitHub but you can create larger ones if wanted with the simulation scripts in the code directory. See the README file in the data directory for a few more instructions.

input_gex <- '../data/cont_var_sim_data.tsv'
input_SNP <- '../data/plink.A-transpose.matrixQTL.geno'
input_covariates <- '../data/cov_sim_data.tsv'

When say covariates in this context, what are we actually referring to?

###SNP and gene positions

Here we’ll assume the data is from metabolites so we won’t include locations.

#input_snpsPos <- '../data/snpsloc.txt'
#input_genePos <- '../data/geneloc.txt'

####Output files

We’ll be saving some results, we can define the names of the files here:

output_file_name <- sprintf('%s/QTL_analysis_larger.mxqtl.tsv', path_results)
output_file_name
## [1] "../my_results//QTL_analysis_larger.mxqtl.tsv"
output_file_name.cis <- sprintf('%s/QTL_analysis_larger.mxqtl.cis.tsv', path_results)
output_file_name.cis
## [1] "../my_results//QTL_analysis_larger.mxqtl.cis.tsv"

Let’s look at the molecular data:

gex_data <- read.csv(input_gex, header = TRUE, stringsAsFactors = FALSE, sep = '\t')
gex_data[1:5, 1:5]
dim(gex_data)
## [1]  2000 10001
head(gex_data)
tail(gex_data)
str(gex_data, list.len = 10)
## 'data.frame':    2000 obs. of  10001 variables:
##  $ X      : chr  "var0" "var1" "var2" "var3" ...
##  $ per0   : num  2.05 2.14 2.11 2.14 2.09 ...
##  $ per1   : num  2.16 2.01 2.01 2.06 2.16 ...
##  $ per2   : num  2.1 2.05 2.1 2.15 2.1 ...
##  $ per3   : num  2.15 2.06 2.03 2.01 2.01 ...
##  $ per4   : num  2.05 2.02 2.02 2.17 2.04 ...
##  $ per5   : num  2.1 2.1 2.08 2.12 2.04 ...
##  $ per6   : num  2.03 2.04 2.07 2.06 2.11 ...
##  $ per7   : num  2.1 2.14 2.01 2.06 2.13 ...
##  $ per8   : num  2.05 2.01 2.13 2.16 2.14 ...
##   [list output truncated]
class(gex_data)
## [1] "data.frame"
colnames(gex_data)[1:10]
##  [1] "X"    "per0" "per1" "per2" "per3" "per4" "per5" "per6" "per7" "per8"
rownames(gex_data)[1:10]
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

Let’s plot a few of the variables just to check that our dataset is as we expect:

boxplot(gex_data[, 2:6])

boxplot(gex_data[, (ncol(gex_data)-5):ncol(gex_data)])

What are we actually plotting here? What are the columns and what are the rows?

What if we want to see how one (or more) of the measured molecules looks across all individuals?

This is a very quick look at the data, just for sanity, as we should have already made some pretty plots during the QC stage.

boxplot(t(gex_data[, -1]))


Let’s look at the genetic data:

SNP_data <- read.csv(input_SNP, header = TRUE, stringsAsFactors = FALSE, sep = '\t')
SNP_data[1:5, 1:5]
head(SNP_data)
tail(SNP_data)
str(SNP_data, list.len = 10)
## 'data.frame':    1000 obs. of  10001 variables:
##  $ SNP            : chr  "snp0" "snp1" "snp2" "snp3" ...
##  $ per0_per0      : int  2 1 2 1 2 0 1 1 1 1 ...
##  $ per1_per1      : int  0 1 1 1 0 0 2 0 2 0 ...
##  $ per2_per2      : int  1 1 1 1 1 1 1 1 1 0 ...
##  $ per3_per3      : int  0 1 2 0 1 2 2 0 2 1 ...
##  $ per4_per4      : int  1 0 2 1 1 1 1 0 1 1 ...
##  $ per5_per5      : int  1 0 2 1 0 1 1 1 1 2 ...
##  $ per6_per6      : int  1 2 0 2 1 2 2 1 1 2 ...
##  $ per7_per7      : int  0 2 1 2 0 1 1 0 0 2 ...
##  $ per8_per8      : int  2 2 2 1 1 2 0 2 1 1 ...
##   [list output truncated]
dim(SNP_data)
## [1]  1000 10001
class(SNP_data)
## [1] "data.frame"
colnames(SNP_data)[1:10]
##  [1] "SNP"       "per0_per0" "per1_per1" "per2_per2" "per3_per3" "per4_per4"
##  [7] "per5_per5" "per6_per6" "per7_per7" "per8_per8"
rownames(SNP_data)[1:10]
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

Let’s plot a few of the variables just to check quickly that our dataset is as we expect:

boxplot(SNP_data[, 2:6])

boxplot(SNP_data[, (ncol(SNP_data)-5):ncol(SNP_data)])

What is this about?

SNP data in this file looks more like count data (number of minor alleles at that particular genomic location).

Simple counts or a table might be better here:

as.data.frame(apply(SNP_data, 1, function(x) count(x))[1]) # rows

Let’s look at the covariates data:

cov_data <- read.csv(input_covariates, header = TRUE, stringsAsFactors = FALSE, sep = '\t')
cov_data[1:5, 1:5]
dim(cov_data)
## [1]    10 10001
head(cov_data)
tail(cov_data)
str(cov_data, list.len = 10)
## 'data.frame':    10 obs. of  10001 variables:
##  $ X      : chr  "var0" "var1" "var2" "var3" ...
##  $ per0   : num  2.01 2.09 2.05 2.04 2.15 ...
##  $ per1   : num  2.23 2.13 2.12 2.02 2.03 ...
##  $ per2   : num  2.06 2.16 2.23 2.04 2.02 ...
##  $ per3   : num  2.01 2.13 2.03 2.16 2.05 ...
##  $ per4   : num  2.05 2.1 2.14 2.11 2.02 ...
##  $ per5   : num  2.09 2.07 2.21 2.04 2.07 ...
##  $ per6   : num  2 2.14 2.2 2.06 2.02 ...
##  $ per7   : num  2.26 2.14 2.09 2.04 2.07 ...
##  $ per8   : num  2.06 2.13 2.03 2.02 2.12 ...
##   [list output truncated]
class(cov_data)
## [1] "data.frame"
colnames(cov_data)[1:10]
##  [1] "X"    "per0" "per1" "per2" "per3" "per4" "per5" "per6" "per7" "per8"
rownames(cov_data)[1:10]
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

What could these variables represent?

We can assume we ran principle components analysis on our data beforehand and we’ll use these as possible confounders.

summary(cov_data[, 1:10])
##       X                  per0            per1            per2      
##  Length:10          Min.   :2.009   Min.   :2.005   Min.   :2.020  
##  Class :character   1st Qu.:2.035   1st Qu.:2.032   1st Qu.:2.040  
##  Mode  :character   Median :2.060   Median :2.063   Median :2.075  
##                     Mean   :2.067   Mean   :2.082   Mean   :2.116  
##                     3rd Qu.:2.091   3rd Qu.:2.125   3rd Qu.:2.204  
##                     Max.   :2.148   Max.   :2.229   Max.   :2.277  
##       per3            per4            per5            per6      
##  Min.   :2.007   Min.   :2.015   Min.   :2.012   Min.   :2.001  
##  1st Qu.:2.032   1st Qu.:2.018   1st Qu.:2.032   1st Qu.:2.023  
##  Median :2.068   Median :2.074   Median :2.072   Median :2.043  
##  Mean   :2.091   Mean   :2.072   Mean   :2.080   Mean   :2.062  
##  3rd Qu.:2.155   3rd Qu.:2.118   3rd Qu.:2.084   3rd Qu.:2.066  
##  Max.   :2.196   Max.   :2.139   Max.   :2.211   Max.   :2.205  
##       per7            per8      
##  Min.   :2.005   Min.   :2.007  
##  1st Qu.:2.039   1st Qu.:2.025  
##  Median :2.083   Median :2.069  
##  Mean   :2.101   Mean   :2.076  
##  3rd Qu.:2.156   3rd Qu.:2.122  
##  Max.   :2.259   Max.   :2.166

###Basic sanity Let’s double check that the order between files matches, often this is essential as silent errors can happen if they don’t.

identical(colnames(gex_data), colnames(SNP_data))
## [1] FALSE
identical(colnames(gex_data), colnames(cov_data))
## [1] TRUE
identical(rownames(gex_data), rownames(cov_data))
## [1] FALSE

Why are all these false? Is it a problem with the actual order or with the variable and row names?


###Running QTL analysis with MatrixEQTL

MatrixEQTL is an R package (and its paper) that can handle billions of comparisons efficiently. It runs regression tests on SNP data and a continuous variable such as gene expression or metabolite abundance levels. See the tutorial online and the original paper to understand more about the use of linear algebra and what each of the parameters means. Some of the data that we are using are from their examples.

We’ll set up the parameters that we need for MatrixEQTL here. I’ve commented out some of the variables we don’t need to set in this particular analysis:

useModel <- modelLINEAR # modelANOVA or modelLINEAR or modelLINEAR_CROSS

# The p-value threshold determines which gene-SNP associations are saved in the output file. 
# Note that for larger datasets the threshold should be lower. Setting the threshold to a high value for a large dataset may cause excessively large output files.
pvOutputThreshold <- 1e-5
# pvOutputThreshold.cis <- 1e-3

# Determine the distance between genes and SNPs to define cis and trans:
# cisDist <- 1e3

# Define the covariance matrix for the error term. 
# Consider an error covariance matrix if necessary (correlated variables or errors)
# This parameter is rarely used. If the covariance matrix is a multiple of identity, set it to numeric().
errorCovariance <- numeric()

###Regression analysis: effect of genetic variation on molecular changes

In the previous sections we loaded the data to inspect it. We’ll do so again here using MatrixEQTL. We could have saved that earlier step though if we had already explored our data and were now running the QTL directly. We could also have used the functions that MatrixEQTL provides instead and explored it here.

snps = SlicedData$new();
snps$fileDelimiter = "\t";      # the TAB character
snps$fileOmitCharacters = "NA"; # denote missing values;
snps$fileSkipRows = 1;          # one row of column labels
snps$fileSkipColumns = 1;       # one column of row labels
snps$fileSliceSize = 2000;      # read file in pieces of 2,000 rows
snps$LoadFile(input_SNP)
## Rows read: 1000 done.
snps
## SlicedData object. For more information type: ?SlicedData
## Number of columns: 10000 
## Number of rows: 1000 
## Data is stored in 1 slices
## Top left corner of the first slice (up to 10x10):
##      per0_per0 per1_per1 per2_per2 per3_per3 per4_per4 per5_per5 per6_per6
## snp0         2         0         1         0         1         1         1
## snp1         1         1         1         1         0         0         2
## snp2         2         1         1         2         2         2         0
## snp3         1         1         1         0         1         1         2
## snp4         2         0         1         1         1         0         1
## snp5         0         0         1         2         1         1         2
## snp6         1         2         1         2         1         1         2
## snp7         1         0         1         0         0         1         1
## snp8         1         2         1         2         1         1         1
## snp9         1         0         0         1         1         2         2
##      per7_per7 per8_per8 per9_per9
## snp0         0         2         0
## snp1         2         2         1
## snp2         1         2         1
## snp3         2         1         2
## snp4         0         1         0
## snp5         1         2         0
## snp6         1         0         1
## snp7         0         2         2
## snp8         0         1         1
## snp9         2         1         1
gene = SlicedData$new();
gene$fileDelimiter = "\t";      # the TAB character
gene$fileOmitCharacters = "NA"; # denote missing values;
gene$fileSkipRows = 1;          # one row of column labels
gene$fileSkipColumns = 1;       # one column of row labels
gene$fileSliceSize = 2000;      # read file in pieces of 2,000 rows
gene$LoadFile(input_gex);
## Rows read: 2,000
## Rows read: 2000 done.
gene
## SlicedData object. For more information type: ?SlicedData
## Number of columns: 10000 
## Number of rows: 2000 
## Data is stored in 1 slices
## Top left corner of the first slice (up to 10x10):
##          per0     per1     per2     per3     per4     per5     per6     per7
## var0 2.047919 2.157214 2.098319 2.149491 2.048774 2.096586 2.028797 2.102599
## var1 2.138952 2.008712 2.053794 2.059250 2.019757 2.100680 2.042685 2.137882
## var2 2.106723 2.008298 2.101791 2.033315 2.019592 2.078682 2.073252 2.014643
## var3 2.141172 2.062153 2.147185 2.007342 2.166873 2.119908 2.059355 2.058301
## var4 2.091915 2.159554 2.099462 2.013100 2.039233 2.037482 2.105029 2.126753
## var5 2.100446 2.107513 2.035646 2.100645 2.116844 2.183020 2.055370 2.125795
## var6 2.013193 2.059272 2.082140 2.086952 2.017216 2.224766 2.092975 2.020277
## var7 2.103222 2.076895 2.122508 2.061593 2.039443 2.112829 2.194389 2.078912
## var8 2.029294 2.114037 2.059277 2.074914 2.250834 2.232127 2.143802 2.215776
## var9 2.038085 2.024891 2.066474 2.061132 2.073689 2.134233 2.016627 2.005649
##          per8     per9
## var0 2.046070 2.053988
## var1 2.008383 2.008699
## var2 2.125122 2.009208
## var3 2.158606 2.021544
## var4 2.137508 2.074271
## var5 2.060269 2.003226
## var6 2.033903 2.162003
## var7 2.049981 2.077867
## var8 2.049508 2.201127
## var9 2.179318 2.098265
cvrt = SlicedData$new();
cvrt$fileDelimiter = "\t";      # the TAB character
cvrt$fileOmitCharacters = "NA"; # denote missing values; This is from the plink encoding.
cvrt$fileSkipRows = 1;          # one row of column labels
cvrt$fileSkipColumns = 1;       # one column of row labels
cvrt$fileSliceSize = 2000;      # read file in pieces of 2,000 rows
cvrt$LoadFile(input_covariates);
## Rows read: 10 done.
cvrt
## SlicedData object. For more information type: ?SlicedData
## Number of columns: 10000 
## Number of rows: 10 
## Data is stored in 1 slices
## Top left corner of the first slice (up to 10x10):
##          per0     per1     per2     per3     per4     per5     per6     per7
## var0 2.009330 2.228918 2.063748 2.006563 2.045015 2.085502 2.001216 2.259454
## var1 2.086642 2.127612 2.164823 2.125737 2.103674 2.069796 2.141913 2.136182
## var2 2.046203 2.118195 2.233618 2.027140 2.139107 2.210576 2.204620 2.092851
## var3 2.036938 2.017015 2.040123 2.164410 2.105893 2.035571 2.059996 2.039394
## var4 2.148413 2.032765 2.020956 2.046700 2.024658 2.073758 2.020253 2.073211
## var5 2.026485 2.031892 2.276596 2.088390 2.128583 2.012345 2.037774 2.038914
## var6 2.034747 2.074755 2.086344 2.046223 2.015175 2.030191 2.004358 2.026039
## var7 2.073489 2.130887 2.039333 2.196269 2.015702 2.077545 2.033115 2.172473
## var8 2.118229 2.051767 2.217081 2.011797 2.016148 2.183372 2.067791 2.162492
## var9 2.092472 2.005226 2.020260 2.191842 2.122189 2.024465 2.047251 2.005448
##          per8     per9
## var0 2.060060 2.030945
## var1 2.132097 2.068080
## var2 2.031814 2.025191
## var3 2.022631 2.057704
## var4 2.119389 2.016638
## var5 2.078147 2.068169
## var6 2.122696 2.037914
## var7 2.166008 2.172131
## var8 2.022996 2.232693
## var9 2.006746 2.037973
# Check files:
str(cvrt)
## Reference class 'SlicedData' [package "MatrixEQTL"] with 9 fields
##  $ dataEnv           :<environment: 0x7fb4578170e8> 
##  $ nSlices1          : int 1
##  $ rowNameSlices     :List of 1
##   ..$ : chr [1:10] "var0" "var1" "var2" "var3" ...
##  $ columnNames       : chr [1:10000] "per0" "per1" "per2" "per3" ...
##  $ fileDelimiter     : chr "\t"
##  $ fileSkipColumns   : num 1
##  $ fileSkipRows      : num 1
##  $ fileSliceSize     : num 2000
##  $ fileOmitCharacters: chr "NA"
##  and 41 methods, of which 27 are  possibly relevant:
##    Clear, Clone, ColumnSubsample, CombineInOneSlice, CreateFromMatrix, FindRow,
##    GetAllRowNames, GetNRowsInSlice, getSlice, getSliceRaw, initialize,
##    IsCombined, LoadFile, nCols, nRows, nSlices, ResliceCombined,
##    RowMatrixMultiply, RowRemoveZeroEps, RowReorder, RowReorderSimple,
##    RowStandardizeCentered, SaveFile, SetNanRowMean, setSlice, setSliceRaw,
##    show#envRefClass
str(snps)
## Reference class 'SlicedData' [package "MatrixEQTL"] with 9 fields
##  $ dataEnv           :<environment: 0x7fb4434b6910> 
##  $ nSlices1          : int 1
##  $ rowNameSlices     :List of 1
##   ..$ : chr [1:1000] "snp0" "snp1" "snp2" "snp3" ...
##  $ columnNames       : chr [1:10000] "per0_per0" "per1_per1" "per2_per2" "per3_per3" ...
##  $ fileDelimiter     : chr "\t"
##  $ fileSkipColumns   : num 1
##  $ fileSkipRows      : num 1
##  $ fileSliceSize     : num 2000
##  $ fileOmitCharacters: chr "NA"
##  and 41 methods, of which 27 are  possibly relevant:
##    Clear, Clone, ColumnSubsample, CombineInOneSlice, CreateFromMatrix, FindRow,
##    GetAllRowNames, GetNRowsInSlice, getSlice, getSliceRaw, initialize,
##    IsCombined, LoadFile, nCols, nRows, nSlices, ResliceCombined,
##    RowMatrixMultiply, RowRemoveZeroEps, RowReorder, RowReorderSimple,
##    RowStandardizeCentered, SaveFile, SetNanRowMean, setSlice, setSliceRaw,
##    show#envRefClass
str(gene)
## Reference class 'SlicedData' [package "MatrixEQTL"] with 9 fields
##  $ dataEnv           :<environment: 0x7fb4c939d500> 
##  $ nSlices1          : int 1
##  $ rowNameSlices     :List of 1
##   ..$ : chr [1:2000] "var0" "var1" "var2" "var3" ...
##  $ columnNames       : chr [1:10000] "per0" "per1" "per2" "per3" ...
##  $ fileDelimiter     : chr "\t"
##  $ fileSkipColumns   : num 1
##  $ fileSkipRows      : num 1
##  $ fileSliceSize     : num 2000
##  $ fileOmitCharacters: chr "NA"
##  and 41 methods, of which 27 are  possibly relevant:
##    Clear, Clone, ColumnSubsample, CombineInOneSlice, CreateFromMatrix, FindRow,
##    GetAllRowNames, GetNRowsInSlice, getSlice, getSliceRaw, initialize,
##    IsCombined, LoadFile, nCols, nRows, nSlices, ResliceCombined,
##    RowMatrixMultiply, RowRemoveZeroEps, RowReorder, RowReorderSimple,
##    RowStandardizeCentered, SaveFile, SetNanRowMean, setSlice, setSliceRaw,
##    show#envRefClass
dim(cvrt)
## [1]    10 10000
dim(snps)
## [1]  1000 10000
dim(gene)
## [1]  2000 10000

Call MatrixEQTL’s main function:

results_mxqtl <- Matrix_eQTL_main(
  snps = snps,
  gene = gene,
  cvrt = cvrt,
  output_file_name = output_file_name,
  # output_file_name.cis = output_file_name.cis,
  useModel = useModel,
  errorCovariance = errorCovariance,
  verbose = TRUE,
  pvalue.hist = 'qqplot',
  min.pv.by.genesnp = FALSE,
  noFDRsaveMemory = FALSE,
  pvOutputThreshold = pvOutputThreshold,
  # pvOutputThreshold.cis = pvOutputThreshold.cis,
  # snpspos = snpsPos_data,
  # genepos = probePos_data,
  # cisDist = cisDist
  )
## Processing covariates
## Task finished in 0.009 seconds
## Processing gene expression data (imputation, residualization)
## Task finished in 0.957 seconds
## Creating output file(s)
## Task finished in 0.012 seconds
## Performing eQTL analysis
## 100.00% done, 21 eQTLs
## Task finished in 15.624 seconds
## 

We can inspect the results now. Each significant gene-SNP association is recorded in a separate line in the output file and in the returned object results_mxqtl. In case of cis/trans eQTL analysis described below, two output files are produced, one with cis-eQTLs, another only with trans. Every record contains a SNP name, a transcript name, estimate of the effect size, t- or F-statistic, p-value, and FDR.

show(results_mxqtl$all$eqtls) # The output will be NULL if we specified a cis distance
##      snps    gene statistic       pvalue       FDR         beta
## 1  snp948 var1188  4.864856 1.162899e-06 0.8848146  0.004137290
## 2  snp353 var1143 -4.826843 1.407705e-06 0.8848146 -0.004095553
## 3  snp148  var463  4.742019 2.145269e-06 0.8848146  0.004014038
## 4  snp904 var1125  4.698051 2.661618e-06 0.8848146  0.004043992
## 5  snp613 var1586  4.693933 2.715676e-06 0.8848146  0.004031502
## 6  snp118 var1235 -4.693783 2.717664e-06 0.8848146 -0.004034465
## 7  snp780 var1212  4.611989 4.037927e-06 0.8848146  0.003977466
## 8  snp599  var610  4.606729 4.141159e-06 0.8848146  0.003899642
## 9  snp646 var1076 -4.598049 4.317045e-06 0.8848146 -0.003880063
## 10 snp381 var1488  4.592933 4.424073e-06 0.8848146  0.003897506
## 11 snp647  var548  4.560155 5.172410e-06 0.9188292  0.003906009
## 12 snp141  var731  4.546515 5.518310e-06 0.9188292  0.003829261
## 13  snp69 var1761 -4.499408 6.891324e-06 0.9188292 -0.003855991
## 14 snp216  var626  4.499003 6.904423e-06 0.9188292  0.003883072
## 15 snp222  var175  4.483492 7.425009e-06 0.9188292  0.003864421
## 16 snp215  var702  4.459259 8.314165e-06 0.9188292  0.003841642
## 17 snp982  var747 -4.456768 8.411126e-06 0.9188292 -0.003864064
## 18 snp200  var518 -4.452391 8.584136e-06 0.9188292 -0.003745085
## 19 snp991  var973 -4.436626 9.235856e-06 0.9188292 -0.003798167
## 20 snp295  var299  4.431270 9.467834e-06 0.9188292  0.003812446
## 21 snp638 var1889 -4.427203 9.647706e-06 0.9188292 -0.003805242
head(results_mxqtl$all$eqtls)
plot(results_mxqtl)

head(results_mxqtl$cis$eqtls)
## NULL
nrow(results_mxqtl$cis$eqtls)
## NULL
head(results_mxqtl$cis$ntests)
## NULL
head(results_mxqtl$trans$eqtls)
## NULL
nrow(results_mxqtl$trans$eqtls)
## NULL
head(results_mxqtl$trans$ntests)
## NULL

###Discussion of results

Next steps

What could we do next? Annotation, downstream analysis, plots for publication, follow-up of significant results.

###The end

Questions? Corrections? Comments?


##A few references

R Markdown and Notebook:

Markdown language:

QTLs:

Programming with R best practice:

We use Software Carpentry materials for some of the examples and code. Please take a look at their webpage and lessons if you are interested.